Introduction to Mixed Models

Eva Freyhult

NBIS, SciLifeLab

June 11, 2025

Simple linear regression

\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]

  • \(y_i\): outcome, dependent variable
  • \(x_i\): predictor, independent variable
  • \(\beta_0\): intercept
  • \(\beta_1\): slope
  • \(\varepsilon_i\): error term
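To see the model in action, one can simulate data from it and fit it with lm(). This is a minimal sketch; the parameter values are arbitrary, hypothetical choices and not part of the example data used later.

```r
# Simulate y_i = beta_0 + beta_1 * x_i + epsilon_i with hypothetical values
set.seed(1)
n <- 100
beta0 <- 2      # true intercept (hypothetical)
beta1 <- 0.5    # true slope (hypothetical)
sigma <- 1      # standard deviation of the error term (hypothetical)

x <- runif(n, min = 0, max = 10)
y <- beta0 + beta1 * x + rnorm(n, mean = 0, sd = sigma)

# Fit the simple linear regression; the estimates should be close to beta0 and beta1
fit <- lm(y ~ x)
coef(fit)
```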

Assumptions

  • Linearity: The relationship between x and y is linear
  • Independence: Observations are independent
  • Homoscedasticity: Constant variance of residuals
  • Normality: Residuals are normally distributed

Example: Association between age and plasma concentration

m0 <- lm(conc ~ age, data = df)
summary(m0)

Call:
lm(formula = conc ~ age, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-42.492 -12.643  -0.968   9.998  50.981 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 126.7136    10.0424  12.618 3.68e-15 ***
age           1.3124     0.2619   5.011 1.28e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 19.55 on 38 degrees of freedom
Multiple R-squared:  0.3979,    Adjusted R-squared:  0.3821 
F-statistic: 25.11 on 1 and 38 DF,  p-value: 1.282e-05

Example: Association between age and plasma concentration

Are the assumptions fulfilled?
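A sketch of how the assumptions can be checked in R. Because the data behind df are not shown, the example below simulates hypothetical grouped data (four groups with different baselines and a common slope) and fits a model that ignores the groups. For the model above, the corresponding calls would be plot(m0, which = 1) and plot(m0, which = 2).

```r
# Hypothetical data: four groups with different baselines, same slope for x
set.seed(2)
n_per_group <- 10
group <- factor(rep(1:4, each = n_per_group))
baseline <- c(-30, -10, 10, 30)[group]   # group-specific offsets (made up)
x <- runif(length(group), min = 20, max = 60)
y <- 100 + baseline + 0.5 * x + rnorm(length(group), sd = 5)

fit <- lm(y ~ x)   # ignores the grouping

# Linearity and homoscedasticity: residuals vs fitted values
plot(fit, which = 1)
# Normality: normal Q-Q plot of the residuals
plot(fit, which = 2)

# Independence: residuals from the same group should not share a common shift
group_means <- tapply(residuals(fit), group, mean)
group_means
```

Clearly different mean residuals per group indicate that observations within a group are not independent given x, which motivates adding group to the model.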

Example: Association between age and plasma concentration

Add group as a categorical predictor.

m1 <- lm(conc ~ age + group, data = df)
summary(m1)

Call:
lm(formula = conc ~ age + group, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-16.849  -8.185   0.786   6.618  17.656 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  303.4541    15.9759  18.994  < 2e-16 ***
age           -1.9022     0.3042  -6.254 3.58e-07 ***
group2      -110.9344     9.7480 -11.380 2.59e-13 ***
group3       -82.1424     7.4540 -11.020 6.29e-13 ***
group4       -44.7618     5.3530  -8.362 7.33e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.24 on 35 degrees of freedom
Multiple R-squared:  0.8762,    Adjusted R-squared:  0.862 
F-statistic: 61.92 on 4 and 35 DF,  p-value: 2.173e-15

Linear mixed models

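Linear regression assumes that all observations are independent. This assumption is violated when observations are grouped, because observations within the same group tend to be more similar to each other than to observations from other groups.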

Ignoring the grouping structure can lead to:

  • Misleading conclusions
  • Incorrect standard errors
  • Inflated type I error rates

A linear mixed model accounts for the grouping structure by including random effects.
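The inflated type I error can be illustrated with a small simulation. This is a sketch with hypothetical settings: a predictor measured at the group level has no real effect on the outcome, but observations within a group share a random offset, and lm() analyses them as if they were independent.

```r
# Hypothetical simulation settings
set.seed(3)
n_groups <- 10
n_per_group <- 20
n_sim <- 500

p_values <- replicate(n_sim, {
  group <- rep(seq_len(n_groups), each = n_per_group)
  x <- rnorm(n_groups)[group]              # group-level predictor
  b <- rnorm(n_groups, sd = 1)[group]      # random group offsets
  y <- b + rnorm(n_groups * n_per_group, sd = 1)  # no true effect of x
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
})

# Empirical type I error rate; the nominal level is 0.05
mean(p_values < 0.05)
```

With these settings the rejection rate is far above 0.05, because lm() treats the 200 correlated observations as 200 independent pieces of information.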

Grouping structures

Grouped observations are very common in the life sciences:

  • Repeated measures: Multiple measurements from the same subject.
  • Longitudinal studies: Measurements of the same subject at multiple time points.
  • Nested designs: Measurements of mice within cages within labs.
  • Multi-omics studies: Omics data (genomics, transcriptomics, proteomics, etc.) from the same individual.
  • Experimental designs: Technical replicates (same sample); measurements from different batches, labs, regions, etc.

Mixed effects models

Fixed effects

  • Population-level effects
  • Estimated explicitly

Random effects

  • Account for variation between groups (subjects, batches, etc.)
  • Assume that the group effects come from a distribution (usually normal) across groups
  • Group effects are not estimated as separate parameters; instead, their variance is estimated
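Because only the variance is estimated, the individual group effects are predicted afterwards, and the predictions are pulled towards zero. A minimal sketch for the simplest case, assuming a random intercept with no covariates and variance components treated as known: the predicted effect of group \(i\) is its mean deviation from the overall mean multiplied by \(\lambda_i = \sigma_b^2 / (\sigma_b^2 + \sigma^2 / n_i)\), where \(\sigma_b^2\) is the between-group variance, \(\sigma^2\) the residual variance and \(n_i\) the group size. The variance values below are hypothetical.

```r
# Shrinkage factor for a random intercept model without covariates
shrinkage <- function(sigma2_b, sigma2, n) {
  sigma2_b / (sigma2_b + sigma2 / n)
}

sigma2_b <- 4          # between-group variance (hypothetical)
sigma2   <- 16         # residual variance (hypothetical)
n <- c(1, 5, 20, 100)  # group sizes

lambda <- shrinkage(sigma2_b, sigma2, n)
round(lambda, 3)  # 0.200 0.556 0.833 0.962

# Predicted effect for a group of size 5 whose mean is 10 units above the overall mean
shrinkage(sigma2_b, sigma2, 5) * 10
```

Small groups are shrunk strongly towards the overall mean; as the group size grows, \(\lambda_i\) approaches 1 and the prediction approaches the group's own mean deviation.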

Grouping structure

The grouping structure can be cohorts, batches, subjects, schools, hospitals, doctors, cages, labs, etc.

It should be determined from the study design, not inferred from the data.

Note: The grouping is always categorical.
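In R, this means the grouping variable should be stored as a factor. If group labels are numeric codes, lm() treats them as a continuous covariate with a single slope. A small sketch on hypothetical data:

```r
# Hypothetical data: groups coded as the integers 1-4
set.seed(4)
d <- data.frame(group = rep(1:4, each = 5), y = rnorm(20))

fit_numeric <- lm(y ~ group, data = d)          # group treated as a number
fit_factor  <- lm(y ~ factor(group), data = d)  # group treated as a category

names(coef(fit_numeric))  # "(Intercept)" "group"
names(coef(fit_factor))   # "(Intercept)" "factor(group)2" "factor(group)3" "factor(group)4"
```

The factor version gives one coefficient per non-reference group, matching the group2, group3 and group4 rows in the m1 output above.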

Model with fixed and random effects

Once the grouping structure is decided, we can decide which effects are fixed and which are random.

The effects can be:

  • Intercept
  • Slope
  • Interaction

Example: Concentration and age

Model the association between plasma concentration and age.

Here, the groups could be, for example, different clinics.

Model as fixed or random effect

We believe that:

  • the effect of age on concentration is the same across clinics, hence a fixed effect
  • the intercept (baseline concentration) varies between clinics

Options for the intercept

  1. Include group as a fixed effect to explicitly estimate each group’s baseline. This uses more degrees of freedom (one per additional group) but allows us to compare clinics.
  2. Model the intercept as a random effect across groups: estimate the variance of the intercepts over groups rather than a separate intercept for each group. This is more parsimonious and lets us focus on the overall effect of age without estimating each clinic’s baseline concentration.
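The difference in model size can be sketched on hypothetical data with 20 clinics (the number of clinics, sample sizes and parameter values are invented for illustration). For both lm and lmer fits, attr(logLik(fit), "df") reports the number of estimated parameters, including the residual standard deviation.

```r
library(lme4)

# Hypothetical data: 20 clinics with 10 patients each
set.seed(5)
n_clinics <- 20
clinic <- factor(rep(seq_len(n_clinics), each = 10))
age <- runif(length(clinic), min = 20, max = 70)
clinic_effect <- rnorm(n_clinics, sd = 10)[clinic]
conc <- 150 - 1.5 * age + clinic_effect + rnorm(length(clinic), sd = 5)
d <- data.frame(conc, age, clinic)

# Option 1: clinic as a fixed effect (one coefficient per non-reference clinic)
fit_fixed <- lm(conc ~ age + clinic, data = d)
# Option 2: random intercept for clinic (a single variance parameter)
fit_random <- lmer(conc ~ age + (1 | clinic), data = d)

attr(logLik(fit_fixed), "df")   # 2 + 19 + 1 = 22
attr(logLik(fit_random), "df")  # 2 fixed + 1 random-effect variance + 1 residual = 4
```

The fixed-effect model grows by one parameter per extra clinic, while the random-intercept model keeps the same number of parameters regardless of how many clinics there are.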

Model with random intercept

\[y_{ij} = \beta_0 + \beta_1 x_{ij} + b_{0i} + \varepsilon_{ij}\]

  • \(y_{ij}\): outcome (concentration) for observation \(j\) in group \(i\)
  • \(x_{ij}\): predictor (age)
  • \(\beta_0, \beta_1\): fixed effects (intercept and slope)
  • \(b_{0i}\): random intercept for group \(i\)
    • \(b_{0i} \sim N(0, \sigma_b^2)\), where \(\sigma_b\) is the standard deviation of the random intercept
  • \(\varepsilon_{ij}\): residual error, \(\varepsilon_{ij} \sim N(0, \sigma^2)\), independent of \(b_{0i}\)
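The model can be made concrete by simulating from it. A minimal sketch in base R with hypothetical parameter values: each group draws a single \(b_{0i}\), which is shared by all of its observations and makes them correlated.

```r
# Hypothetical parameter values
set.seed(6)
n_groups <- 4
n_per_group <- 10
beta0 <- 200     # fixed intercept
beta1 <- -2      # fixed slope
sigma_b <- 40    # SD of the random intercepts b_0i
sigma <- 10      # SD of the residuals epsilon_ij

i <- rep(seq_len(n_groups), each = n_per_group)  # group index for each observation
x <- runif(length(i), min = 20, max = 60)        # predictor, e.g. age
b0 <- rnorm(n_groups, mean = 0, sd = sigma_b)    # one random intercept per group
eps <- rnorm(length(i), mean = 0, sd = sigma)    # residual errors
y <- beta0 + beta1 * x + b0[i] + eps

sim <- data.frame(y = y, x = x, group = factor(i))
head(sim)
```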

Mixed model in R

library(lme4)

mm <- lmer(conc ~ age + (1 | group), data = df)

  • age: fixed effect
  • (1 | group): random intercept for group

summary(mm)
Linear mixed model fit by REML ['lmerMod']
Formula: conc ~ age + (1 | group)
   Data: df

REML criterion at convergence: 304.1

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.75481 -0.89181  0.09125  0.70848  1.85339 

Random effects:
 Groups   Name        Variance Std.Dev.
 group    (Intercept) 2224.44  47.164  
 Residual               85.47   9.245  
Number of obs: 40, groups:  group, 4

Fixed effects:
            Estimate Std. Error t value
(Intercept) 241.3334    26.0615   9.260
age          -1.8293     0.3014  -6.068

Correlation of Fixed Effects:
    (Intr)
age -0.422
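One way to read the variance components is the intraclass correlation \(\sigma_b^2 / (\sigma_b^2 + \sigma^2)\): the share of the variance not explained by age that lies between groups. A sketch using the numbers printed above, copied by hand:

```r
# Variance components copied from the summary(mm) output
var_group    <- 2224.44  # variance of the random intercept (group)
var_residual <- 85.47    # residual variance

icc <- var_group / (var_group + var_residual)
round(icc, 3)  # 0.963
```

A value this close to 1 means that almost all of the variation left after adjusting for age is between groups. With a fitted model, as.data.frame(VarCorr(mm)) returns the variance components in its vcov column, so they need not be copied by hand.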

Random slope

If you believe that the effect of age (the slope) varies between groups, include a random slope for age in addition to the random intercept.
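Written out for group \(i\), the random-slope model extends the random-intercept model above. This corresponds to the lme4 formula (1 + age | group), which also estimates the correlation \(\rho\) between the two random effects:

\[y_{ij} = \beta_0 + \beta_1 x_{ij} + b_{0i} + b_{1i} x_{ij} + \varepsilon_{ij}\]

\[\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{b_0}^2 & \rho\,\sigma_{b_0}\sigma_{b_1} \\ \rho\,\sigma_{b_0}\sigma_{b_1} & \sigma_{b_1}^2 \end{pmatrix}\right), \qquad \varepsilon_{ij} \sim N(0, \sigma^2)\]

In the summary(mm2) output, the Std.Dev. column gives \(\sigma_{b_0}\), \(\sigma_{b_1}\) and \(\sigma\), and the Corr column gives \(\rho\).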

Random slope in R

mm2 <- lmer(conc ~ age + (1 + age | group), data = df)
summary(mm2)
Linear mixed model fit by REML ['lmerMod']
Formula: conc ~ age + (1 + age | group)
   Data: df

REML criterion at convergence: 302.9

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.5822 -0.8157  0.2107  0.7805  1.6553 

Random effects:
 Groups   Name        Variance  Std.Dev. Corr 
 group    (Intercept) 4113.8342 64.1392       
          age            0.3925  0.6265  -0.67
 Residual               77.1173  8.7816       
Number of obs: 40, groups:  group, 4

Fixed effects:
            Estimate Std. Error t value
(Intercept) 249.2426    34.0378   7.323
age          -1.9862     0.4373  -4.542

Correlation of Fixed Effects:
    (Intr)
age -0.682

Predicting random effects

The fixed effects can be extracted with fixef(), and the random effects can be predicted with ranef().

fixef(mm)
(Intercept)         age 
 241.333393   -1.829255 
ranef(mm)
$group
  (Intercept)
1    58.13116
2   -50.26973
3   -22.21498
4    14.35354

with conditional variances for "group" 

The fixed intercept gives the overall (population-level) baseline, and each random effect is the predicted deviation of a group's intercept from it.
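The group-specific intercepts are obtained by adding each predicted deviation to the fixed intercept. A sketch using the values printed above, copied by hand; with the fitted model, coef(mm)$group returns these intercepts together with the common age slope.

```r
# Values copied from the fixed-effect and random-effect output above
fixed_intercept <- 241.333393
random_intercepts <- c("1" = 58.13116, "2" = -50.26973,
                       "3" = -22.21498, "4" = 14.35354)

# Group-specific intercept = fixed intercept + predicted group deviation
group_intercepts <- fixed_intercept + random_intercepts
group_intercepts

# The predicted deviations are centred on the fixed intercept (sum is approximately 0 here)
sum(random_intercepts)
```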